Experiment - smallier friendlier openerer research
Things: - Drop in EScoe reference to existing work on uncertainty - Say: yes, lots of this will cause actual statisticians to have kittens, but (a) the basic point remains and (b) the stats can be improved / fixed where it’s wrong. The central point, to repeat, is most definitely not wrong. - Better to have some error in your error than pretend no error exists at all. This will help stop you accidentally following vampires off cliff edges. (Article name: “How to avoid following vampires off cliff edges”?) - Note this ties to type I and II errors too, I think. https://www.linkedin.com/pulse/type-i-ii-error-jason-hornberger-mba/ - Tom’s image from that chat. - Can help us make decisions based on what we do know, not what we don’t. It pushes us to look for sources closer to reality. - I should state up-front, I have done a tonne of this. The lure is very strong. E.g. IMD - it looks so neat and clear. But cf. the issue Rich Harris mentions on stats with compound measures. - So e.g. how does this affect industrial strategy, given what we do and don’t know? - “Before I carry on to look at smaller geographies, productivity etc, stopping in case I can get feedback on this approach / what I might be getting wroing etc…” - Advantage of this approach - can apply the same logic to other sources like companies house and treat those firms as a sample (which is what they are). Using that, what kind of error bars do we get at subregional level? - On the grid thing - so actually in a way, we get more information than before (cos can really see where growth happened and didn’t). - The binary ‘false alarm / missed hazard’ type 1 / 2 thing - make clear in policy action, this is why we iterate and why test and learn is such a good plan. Stats is one (possibly quite minor) tool on the motorbike handlebars that might help indicate if left or right is currently more likely to be better.
..
It’s been over two years since I was first asked to analyse sector strengths in South Yorkshire, as part of my secondment to SYMCA through Y-PERN. That work led in a whole bunch of directions; I’ve tried to keep a list of open code and outputs here.
It’s a good time to try and digest everything, mull what happens next. Inevitably I’ve ended up with more questions than answers, and it’s in that spirit I’m going to write. I’m charting my thinking process on this topic over here as I go.
What I’m aiming to do:
- Use a little open data digging to burrow into some regional economic questions, a topic at a time.
- One of my main questions is: are we using official data sources in the best way we can? If not, what might we do differently?
I will entirely steal Diane Coyle’s attitude to asking these questions: a deep respect for the phenomenally difficult work done to understand our economies by people much smarter than me. In the UK, the ONS has the unenviable task of doing this work while under intense political scrutiny of every fraction of growth, and constant funding / people-not-responding-to-surveys challenges.
So while I’m going to be questioning things, those are open questions from someone who’s not immersed daily in the vast corpus of national accounts theory. As (fictional) vice-president John Hoynes put it in the West Wing, the tonnage of what I don’t know could stun a team of oxen in its tracks.
That said… I am starting with a notion: there are problems using the national accounts framework for regional economies. These issues are all known, and talked about with varying degrees of sense, hysteria and trying to sell alternatives.
The issues all stem from the same root: national accounts methods (1) are designed to analyse nations, not regions, and (2) are accounts. They have to balance, every part of the economy has to add up correctly and consistently - including across all UK regions. This forces a national-level, top-down straitjacket on regional numbers. Let’s make a start in this post on one of the most important consequences of that:
- Accounts only report single values. It’s all they can report. Uncertainty doesn’t feature.
Through this lens, the UK grew by exactly 0.1% in the three months to November 2025. Uncertainty only enters via later revisions to those exact figures - the image here is from the ONS showing how much revisions over time differ from initial numbers. These are very much not confidence intervals showing the underlying uncertainty in the data.
And they try to do this work when chancellors and shadow chancellors shout and point fingers over every 0.1% revision. Argh.
Or: the ONS have no choice here - the national accounts straitjacket…
I want to explore these through the data, and look for some ways forward.
![[Pasted image 20260204141532.png]]
In the ONS’ method doc for regional GVA, they explicitly say:
“The complex process by which GVA estimates are produced means that it is not currently possible to define the accuracy of the estimates… for example, through their standard errors. Therefore, the reliability of the estimates is measured by the extent of revisions.”
I think ONS are doing themselves a disservice here - they could certainly handle the complexity. It’s just the ‘accounting’ point again: the assembly of national output involves so many moving parts, and the need for perfect balancing (however baroque) so built-in, there just isn’t scope to do anything else with regional economic numbers.
The uncertainty, however, hasn’t gone away. In this post, I’ll have a go at showing what difference the uncertainty could make to the numbers, if we could see it.
That’s a different (though of course linked) question from “If things are uncertain, how does that change how we should act?” I’ll come to that one in a later post (I’m just reading Manski’s Lure of Incredible Certitude to warm up for that).
This is all a slightly alarming look back at the certitude I’ve built into my own work too. It’s very difficult to resist, both methods-wise (single values is all we have, often) and what-policymakers-want-to-see-wise. One of my plots below (code here) is a prime example: GVA per hour, ITL2 zones ranked! Who’s up, who’s down on the scoreboard??
![[Pasted image 20260204150037.png]]
An old post by Prof. Richard Harris of Bristol Uni took apart the ‘certain uncertainty of university rankings’, showing just how statistically arbitrary any exact position is. We live in a world where being one place up or down from that exact position (in or out of the THES top 100, say) can make or break student income.
Whether that’s or cause or effect of thinking with single numbers, we can chew over another time. But let’s see if we can do something similar to Richard’s piece for UK regions.
(Or not attempting ranking, just some smaller conclusions…)
Known unknowns
In the movie ‘The Lost Boys’, David and his west-coast vampire buddies race motorbikes across a foggy landscape. Michael (as yet unaware of the blood-drinking habits of his new associates) tries to keep up with them. They egg him on - but then he spots the faint beam of a lighthouse, puts two and two together and barely avoids hurtling over a cliff edge.
One might say David was claiming ‘incredible certitude’ (Manski) about the cliff’s location. “Don’t worry Micheal, it’s definitely, definitely miles from here.”
If there’s fog, what you do about that depends on what you think is going on / what else you feel certain about. But the fog matters, the fog itself is information. (i.e. fog near a cliff is different info to fog walking on an open moor).
..
In this post, I’ll use error rates from the Annual Business Survey (ABS) to make an educated guess of confidence bounds around the regional/sectoral GVA data we often use to characterise our economies.
(For the ABS, point data and error data are in different places and formats - code/comments here run through the fun of combining them. Final joined CSV is here, to save others the pain, though it’s just for GVA, not turnover etc.)
The ABS is designed to capture sectoral and regional data at a reasonable resolution. And as the ONS say in their fascinating breakdown (if you’re into that kind of thing) of “observed/estimated/modelled components of regional GVA”:
Of all the data sources used in regional GVA, the ABS has the greatest overall impact, representing around 71% of GVA(P) and 22% of GVA(I)… It includes elements corresponding to all three of the categories of data we wish to analyse: directly collected from businesses operating in a single region; weighted to represent non-sampled businesses; and apportioned to regions from UK-wide company information.
We get data direct from firms. As that page says:
The ABS data are not used in the agriculture or finance industries, or for the non-market public services or activities of households.
We’ll come back to those exclusions in later posts. But here, it means we’re using actual survey results, not relying on some of the more [[Heroic assumptions|heroic]] estimation/modelling assumptions that go into other aspects of GVA.
Here’s what I’ve done. - Used two sources for GVA output of (2-digit SIC) sectors in ITL1 regions (ye olde Government Office Regions): - The chained volume data (inflation-adjusted / deflated - we’ll come back to that in a later post too, and note ‘telecoms’ below). This aims to tell us ‘real’ output changes over time. From these, we can ask, “Did this sector grow or shrink here?” We can do a few other things e.g. was its growth slope really steeper than other places? - The current prices data, which (being prices as they were in the year they were counted) can be summed, and used to look at location quotients (LQs) - how concentrated a sector is in each place. It’s where we can get a good sense of the UK’s economic structure, because it’s possible to compare sizes across the UK. We’ll have a go at adding error rates to LQs. - For both of those - for sectors where there’s a match between this and the ABS - I’ve transplanted the GVA error rates. We’re asking: “If the regional GVA data had the same error rates as the ABS, what would it look like?”
Is that a reasonable thing to do? Well - (1) it’s an “if then.” If the error rates were this, then… For those sectors that match, I think it’s a fair assumption that the ABS reasonably captures its uncertainty. Adding extra sources would be unlikely to reduce it much. And (2) the alternative is working with the point data as if it was perfectly accurate. Seeing what difference error makes - even if there’s error in our error - could well be more useful than pretending there’s none.
And as we’ll see, adding uncertainty in leads to some quite different ways of thinking about regions. So.
Growth + error
We’ll start with growth in the chained volume data, as it’s actually slightly easier to play with (the LQs below are slightly more faff).